Building a peer-to-peer full-text Web search engine with highly discriminative keys
Abstract
Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However, the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P overlays from the perspective of large-scale search, but we also face new problems due to potentially huge bandwidth consumption during both indexing and querying, as well as the unavailability of global document collection statistics. Since a straightforward application of P2P solutions for Web search generates unscalable indexing and search traffic, we propose a novel indexing technique which maintains a global key index in structured P2P overlays. Keys are highly discriminative terms and term sets that appear in a restricted number of collection documents, thus limiting the size of the global index while ensuring scalable search cost. Our experimental results show reasonable indexing costs while the retrieval quality is comparable to standard centralized solutions with TF-IDF ranking. Our indexing scheme represents a contribution toward realistic P2P Web search engines that opens the opportunity to virtually unlimited resources, well beyond the capacity of today’s best centralized...
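The idea of keys as terms and term sets that appear in a restricted number of documents can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the names `build_key_index`, `df_max`, and `max_key_size` are hypothetical, whitespace tokenization is an assumption, and the distribution of keys over a structured P2P overlay is left out entirely. A term or term set becomes a key when its document frequency is at most `df_max`; terms that are too frequent are combined into larger sets until they become discriminative or the size limit is reached.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, df_max, max_key_size=2):
    """Map each discriminative key (a frozenset of terms) to its posting list.

    A key is kept only if it occurs in at most df_max documents
    (df_max and max_key_size are illustrative parameters).
    """
    # Posting lists for single terms (naive whitespace tokenization).
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings[frozenset([term])].add(doc_id)

    index = {}     # discriminative keys -> posting lists
    frequent = {}  # non-discriminative keys of the current size
    for key, plist in postings.items():
        if len(plist) <= df_max:
            index[key] = plist
        else:
            frequent[key] = plist

    # Expand only the frequent keys into larger term sets.
    for size in range(2, max_key_size + 1):
        candidates = {}
        for (k1, p1), (k2, p2) in combinations(frequent.items(), 2):
            key = k1 | k2
            if len(key) != size or key in candidates:
                continue
            plist = p1 & p2  # documents containing every term of the set
            if plist:
                candidates[key] = plist
        frequent = {}
        for key, plist in candidates.items():
            if len(plist) <= df_max:
                index[key] = plist
            else:
                frequent[key] = plist
    return index


# Toy collection: "p2p", "web" and "search" each occur in 3 documents.
docs = {
    1: "p2p web search",
    2: "p2p web index",
    3: "p2p search engine",
    4: "web search ranking",
}
index = build_key_index(docs, df_max=2)
```

With `df_max=2`, the frequent single terms are not indexed on their own, while pairs such as {"p2p", "web"} are, so every posting list in this toy index stays at or below two entries. This is the bounded-posting-list property the abstract relies on for scalable search cost.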
Similar resources
Using Highly Discriminative Keys for Indexing in a Peer-to-Peer Full-Text Retrieval System
Excessive network bandwidth consumption, caused by the transmission of long posting lists, was identified as one of the major bottlenecks for implementing distributed full-text retrieval in a Peer-to-Peer (P2P) architecture. To address this problem we introduce a novel approach to indexing using highly discriminative terms and term sets, which leads to short posting lists and therefore reduces t...
Beyond Term Indexing: A P2P Framework for Web Information Retrieval
Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely-distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g. bandwidth, ...
SEARCH ENGINE IN LARGE-SCALE PEER-TO-PEER SYSTEMS by AKSHAY LAL
LAL, AKSHAY. Dgoogle: A Full-Text Search Engine in Large-Scale Peer-to-Peer Systems. (Under the direction of Professor Khaled Harfoush). Full-text search engines like Google serve an important role in accessing Internet resources. In such engines, a search for web pages matching a user’s query is typically carried out on a set of co-administered, physically co-located clusters of servers. Full-...
Integrating RDF Querying Capabilities into a Distributed Search Infrastructure
The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...
Towards large scale peer-to-peer web search
Web search engines, such as Google and Yahoo, are based on the centralized database model. Search engines using the centralized database model suffer from several drawbacks: they have a single point of failure, a limited representation of the web, an index that is not up-to-date, and limited scalability. Currently a lot of research is being done on using peer-to-peer (P2P) technology for the u...